A Statistical Viewpoint on the Theory of Evidence


Robert  Hummel 
Michael  Landy 


Abstract 


We describe a viewpoint on the Dempster/Shafer "Theory of Evidence", and
provide an interpretation which regards the combination formulas as
statistics of the opinions of "experts". This is done by introducing spaces
with binary operations that are simpler to interpret or simpler to implement
than the standard combination formula, and showing that these spaces can be
mapped homomorphically onto the Dempster/Shafer theory of evidence space.
The experts in the space of "opinions of experts" combine information in a
Bayesian fashion. We present alternative spaces for the combination of
evidence suggested by this viewpoint.


1.   Introduction 

Many  problems  in  artificial  intelligence  call  for  assessments  of  degrees  of  belief 
in  propositions  based  on  evidence  gathered  from  disparate  sources.  It  is  often 
claimed  that  probabilistic  analysis  of  propositions  is  at  variance  with  intuitive 
notions  of  belief  [7, 17,  19].  Various  methods  have  been  introduced  to  reconcile  the 
discrepancies,  but  no  single  technique  has  settled  the  issue  on  both  theoretical  and 
pragmatic  grounds. 

1.1.  Theory  of  Evidence 

One  method  for  attempting  to  modify  probabilistic  analysis  of  propositions  is 
the  Dempster/Shafer  "Theory  of  Evidence."  This  theory  is  derived  from  notions  of 
upper  and  lower  probabilities,  as  developed  by  Dempster  in  [5].  The  idea  that 
intervals  instead  of  probability  values  can  be  used  to  model  degrees  of  belief  had 
been  suggested  and  investigated  by  earlier  researchers  [9, 13, 17,31],  but  Dempster's 
work  defines  the  upper  and  lower  points  of  the  intervals  in  terms  of  statistics  on 
set-valued  functions  defined  over  a  measure  space.  The  result  is  a  collection  of 
intervals  defined  for  subsets  of  a  fixed  labeling  set,  and  a  combination  formula  for 
combining  collections  of  intervals. 

Alternative  theories  based  on  notions  of  upper  and  lower  probabilities  were 
also  pursued  [13,33],  and  can  be  formally  related  to  the  updating  formulas  used  in 
the  Dempster/Shafer  theory  [19],  but  are  really  a  separate  formulation. 

Dempster  explained  in  greater  detail  how  the  statistical  notion  from  his  earlier 
work  could  be  used  to  assess  beliefs  on  propositions  in  [6].  In  [4],  Dempster  gave 
examples  of  the  use  of  upper  and  lower  probabilities  in  terms  of  finite  populations 
with  discrete  univariate  observable  characteristics,  in  correspondence  with  algebraic 
structures  to  be  discussed  later  in  this  paper.    The  topic  was  taken  up  by  Shafer 
[26,27], and led to publication of a monograph on the "Theory of Evidence" [28].
All  of  these  works  after  [4]  emphasize  the  values  assigned  to  subsets  of  propositions 
(the  "beliefs"),  and  the  combination  formulas,  and  de-emphasize  the  connection  to 
the  statistical  foundations  based  on  the  set-valued  functions  on  a  measure  space. 

The  Dempster/Shafer  theory  of  evidence  has  sparked  considerable  debate 
among  statisticians  and  "knowledge  engineers".  The  theory  has  been  criticized  and 
debated in terms of its behavior and applicability, e.g., [6,21,24,33 (Commentaries
following)]. Some of the questions have been answered by Shafer [29,30], but
discussion of the theoretical underpinnings continues, e.g., [7,18,19]. A related, but
distinct theory of lower probabilities is frequently discussed as another alternative
for uncertain reasoning [13,33]. An excellent study by Kyburg [19] relates the
Dempster/Shafer  theory  to  a  lower  probability  framework,  where  beliefs  are  viewed 
as  extrema  of  opinions  of  experts.  These  viewpoints  have  similarities  to  the  one 
developed  here,  but  differ  in  the  interpretation  of  belief  values. 

Recently,  there  has  been  increased  interest  in  the  use  of  the  Dempster/Shafer 
theory of evidence in expert systems [11,14]. Most of the recent attempts to map
the theory to real applications and practical methods, such as those described in
[2,8,10,15,32], are based on the "constructive probability" techniques described
by Shafer [29], and disregard the statistical theoretical foundations from which the
theory was derived. The constructive theory is based on a notion of fitting
particular problems to scales of canonical examples. In the case of belief functions, the
cornerstone  of  the  Dempster/Shafer  theory,  Shafer  offers  a  set  of  examples  of 
"coded  messages"  being  sent  by  a  random  process,  and  a  set  of  measures  on  belief 
functions  to  assist  in  fitting  parameters  of  the  "coded  message"  example  to  instances 
of  subjective  notions  of  belief.  While  the  "coded  message"  interpretation  is  an 
essentially  statistical  viewpoint  and  isomorphic  to  the  algebraic  spaces  discussed 
here  and  implicit  in  Dempster's  work,  the  proposed  fitting  scheme  attempts  to  apply 
alternate  interpretations  to  the  combination  formula  based  on  subjective  similarities. 

In this paper we present a viewpoint on the Dempster/Shafer theory of
evidence that regards the theory as statistics of opinions of "experts". We relate
the evidence-combination formulas to statistics of experts who perform Bayesian
updating in pairs. In particular, we show that the Dempster rule of combination,
rather than extending Bayesian formulas for combining probabilities, contains
nothing more than Bayes' formula applied to boolean assertions, but tracks multiple
opinions as opposed to a single probabilistic assessment. Finally, we suggest a
related formulation that leads to simpler formulas and fewer variables. In this
formulation, as in the Dempster combination formula, the essential idea is that we
track the statistics of a class of opinions. However, in our new formulation, the
opinions are allowed to be probabilistic, as opposed to the boolean opinions that
are implicit in the Dempster formula.

The authors' interest in the Dempster/Shafer theory of evidence derives from a
study of a large class of iterative knowledge aggregation methods [20]. These
methods, which include relaxation labeling [16], stochastic relaxation [12], neural
models [1], and other "connectionist networks," always attempt to find a true
labeling by updating a state as evidence is accumulated. In the theory of evidence,
as in many other models, the true labeling is one of a finite number of possibilities,
but the state is a collection of numbers describing an element in a continuous domain.
In the Shafer formulation, the state of the system is described by a distribution over
the set of all subsets of the possible labels. That is, each subset $A$ of labels has
assigned to it a number representing a kind of probability that the subset of possible
labels is precisely $A$. Implicit in this model is the notion that an incremental piece
of evidence carries a certain amount of weight or confidence, and distinguishes a
subset of possibilities. Evidence may point to a single inference among the set of
labels, or may point to a subset of the alternatives (see, e.g., [23]). As evidence is
gained, belief values are updated according to a combination formula. The
combination formula is commutative and associative, so a succession of incremental
changes can be combined into a single state that can be regarded as a non-primitive
updating element.

Most of the other iterative models for combining evidence represent the degree
of support for a label by a single number, although there may be additional numbers
in a state vector corresponding to "hidden units." For the state of belief in the
formulation discussed above, there are numbers for every subset of labels. Thus if
there are $n$ labels, a state has (roughly) $2^n$ values. That is, there are many
additional degrees of freedom. Further, not all iterative models have associative
combination formulas. Commutativity is even more problematic, since there is often a
distinction between the current state of belief and the form of representation of
incremental evidence. The Dempster/Shafer formulation is somewhat special, in that
evidence is represented by a second state of belief to be combined, on an equal basis,
with a current state of belief.

1.2.  Theory  of  Belief  Functions 

In this section, we amplify on the distinction between the viewpoint established
in the remainder of this paper and the theory of belief functions, as used in the
Dempster/Shafer theory of evidence.

The canonical examples from which belief functions are to be constructed are
based on "coded messages" $c_1, \ldots, c_n$, which form the values of a random process
with prior probabilities $p_1, \ldots, p_n$ [24]. Each message $c_i$ has an associated
subset $A_i$ of labels, and carries the message that the true label is among $A_i$. The
masses representing the current state are simply the probabilities (with respect to
this random process) of receiving a message associated with a subset $A$. The belief
in a subset $A$ is the probability that a message points to a subset of $A$.
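
The coded-message construction can be illustrated with a minimal sketch. The
function names, the label names, and the numerical priors below are
hypothetical choices made for illustration only; the sketch assumes a finite
label set and represents each subset of labels as a Python frozenset.

```python
from collections import defaultdict


def masses_from_messages(priors, subsets):
    """Mass of A: total prior probability of the messages whose subset is exactly A."""
    m = defaultdict(float)
    for p, a in zip(priors, subsets):
        m[frozenset(a)] += p
    return dict(m)


def belief(m, a):
    """Belief in A: probability that a message points to a subset of A."""
    a = frozenset(a)
    return sum(mass for b, mass in m.items() if b <= a)


# Hypothetical process with three messages over the labels {"x", "y", "z"}.
priors = [0.5, 0.3, 0.2]
subsets = [{"x"}, {"x", "y"}, {"x", "y", "z"}]

m = masses_from_messages(priors, subsets)
bel_x = belief(m, {"x"})         # only the first message points inside {"x"}
bel_xy = belief(m, {"x", "y"})   # the first two messages point inside {"x", "y"}
```

In this example the belief in {"x", "y"} collects the priors of the first two
messages, while the belief in {"y"} alone is zero, because no message points
to a subset of {"y"}.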

The coded-message formulation corresponds exactly with our space of boolean
opinions of experts (section 3.1). Moreover, the combination of coded messages
and the combination of elements in the space of boolean opinions coincide.
Specifically, given a random process of messages $c_1, \ldots, c_n$ with priors
$p_1, \ldots, p_n$, and another process of messages $c_1', \ldots, c_m'$ with priors
$p_1', \ldots, p_m'$, then in combination a pair of codes $(c_i, c_j')$ is chosen
independently, thus with prior probability $p_i p_j'$, and the associated message is
that the truth lies in $A_i \cap A_j'$. Here $c_i$ carries the message $A_i$, and
$c_j'$ carries the message $A_j'$. It is our point, in introducing the
spaces of experts, that the requisite independence includes not only the choice of
messages, but also an assumption that the message is formed by the intersection of
the subsets designated by the constituent messages. As opposed to being
tautological, this intersection involves a conditional independence assumption, a point
that we emphasize by treating the formulation as algebraic structures, and by
considering the space of probabilistic opinions of experts.

In a sense, our space of boolean opinions of experts can be thought of as an
alternative set of canonical examples with which to construct states of belief for
analogous real situations. Necessarily, these examples will be isomorphic to any
other set of canonical examples, and only the language used to describe the same
algebraic spaces varies. However, there is additional richness in the various classes
of canonical examples, since many distinct examples might correspond to an
identical state of belief. By "backing up" to the richness of the space of probabilistic
opinions of experts, we are better able to interpret the foundations of the Dempster
rule of combination, and suggest the alternative formulation that is presented in the
second part of this paper (see Figure 2).

When the theory of belief functions is actually applied to evidential reasoning
situations with uncertain evidence, the belief function is typically regarded as a
variant on a probability measure over the set of labels [25]. An important difference is
that the belief function is not an additive measure. Nonetheless, the belief on a
particular label is identified, in some subjective way, with a probability for that label,
except that degrees of uncertainty are allowed to withhold "mass" to non-singleton
subsets. In the commentaries to Shafer's presentation of the theory of belief
functions and example applications before the Royal Statistical Society [24], several
discussants commented on the need for a closer connection between the canonical
examples and the interpretation of belief values. Professor Barnard, for example,
states  that  "the  connections  between  the  logical  structure  of  the  ...  example  and  the 
story  of  the  uncertain  codes  is  not  at  all  clear."  Professor  Williams  desires  "a 
deeper  justification  of  the  method  and  a  further  treatment  of  'unrelated  bodies  of 
evidence,'  "  while  Professor  Krantz  states  simply  that  "comparison  of  evidence  to  a 
probabilistically  coded  message  seems  strained."  Professor  Fine  summarizes  the 
problem  by  stating  that  "the  coded  message  interpretation  is  ignored  when  actually 
constructing  belief  functions,  calling  into  question  the  relevance  of  the  canonical 
scales." 

We believe that the viewpoint expounded here, and the analytic treatment of
algebraic spaces embodying the combination formula for belief functions,
substantially answers these calls for elucidation of the meaning of belief functions. At the
very minimum, our spaces provide canonical examples where belief values can be
regarded as percentages of sets of experts stating that possible labels are restricted
to within a specified subset. We believe, however, that the viewpoint reduces the
need for subjective balancing between a given probabilistic situation and a "coded
message" interpretation, and instead provides a way in which belief values can be
estimated by, for example, sampling techniques. The crucial point (and presumably
essential to the notion of uncertainty) is that uncertainty is measured over a
different sample space than the labeling situation; in our parlance, the separate
sample space is a set of experts. Further, the viewpoint that evidence can be
represented by collections of opinions or the statistics on a collection of opinions
leads, fairly naturally, to alternate representations from the space of belief states
used in the Dempster/Shafer formulation. Given the fundamental simplicity of the
parameterized statistics space that we introduce in Section 5, we believe that the
viewpoint yields structures for evidential reasoning that might well be applicable
when neither Bayesian probabilistic reasoning nor theories of belief functions are
suitable.

Belief  functions  are  generally  viewed  as  extensions  of  probability  measures 
over  the  set  of  labels.  When  all  masses  occur  on  singleton  subsets,  then  the  belief 
function  is  an  additive  measure,  and  combination  of  such  elements  yields  a  formula 
equivalent  to  Bayes'  formula  with  conditional  independence.  Since  more  general 
belief  functions  are  allowed,  the  Dempster  combination  formula  is  regarded,  from 
this  viewpoint,  as  an  extension  of  Bayes'  formula. 

From  the  point  of  view  of  statistics  of  opinions  of  experts,  as  developed  here, 
the  Dempster  combination  formula  is  explained  by  Bayesian  updating  on  boolean 
opinions  in  all  cases.  The  special-case  Bayes'  formula  is  explained  as  follows. 
When  masses  are  concentrated  on  singletons,  then  each  expert  is  naming  a  single 
label.  Suppose  that  the  percentage  of  experts  naming  a  particular  label  is  the  same 
as  the  actual  probability  for  that  label  given  the  information  available  to  the  experts. 
This  is  an  ergodicity  assumption,  since  chances  are  being  compared  over  two  distinct 
sample  spaces,  the  set  of  experts  and  the  space  of  labeling  situations.  Then  the 
independent  sampling  of  a  pair  of  experts  from  each  of  two  such  collections  of 
experts  mimics  the  independent  probabilistic  assessment  of  conditioning  on  multiple 
hypotheses. 
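
The singleton special case can be checked numerically with a small sketch. The
function name and the two distributions are hypothetical. When every expert
names a single label, a pair of experts agrees only if both name the same
label, so the combination reduces to a normalized pointwise product. That
product matches the conditionally independent Bayesian update when the prior
over labels is uniform; the uniform prior is an assumption of this sketch, not
a statement from the text above.

```python
def combine_singletons(p1, p2):
    """Combine two states whose masses are concentrated on singleton subsets.

    Only pairs naming the same label have a nonempty intersection, so the
    unnormalized result is the pointwise product of the two distributions.
    """
    joint = {lab: p1[lab] * p2.get(lab, 0.0) for lab in p1}
    total = sum(joint.values())  # one minus the conflicting mass
    if total == 0.0:
        raise ValueError("total conflict: the normalized combination is undefined")
    return {lab: v / total for lab, v in joint.items()}


# Hypothetical percentages of experts naming each label in two collections.
p1 = {"a": 0.6, "b": 0.3, "c": 0.1}
p2 = {"a": 0.2, "b": 0.5, "c": 0.3}

post = combine_singletons(p1, p2)
```

Here the unnormalized products are 0.12, 0.15, and 0.03, so the combined state
assigns approximately 0.4, 0.5, and 0.1 to the three labels.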

To what extent can the various viewpoints coexist? As alternative scales of
canonical examples, there is no conflict between opinions of experts and coded
messages. However, the viewpoint that regards the masses and beliefs as probabilities
of boolean random variables defined on a sample space of experts, distinct from the
sample space of labeling situations, seems to give additional intuitive insight, as
stated by Professor Kingman in the same commentaries to [24]. Further, as we
emphasize here, this viewpoint is isomorphic to the structures for combining
evidence, modulo the terminology. But in order to reconcile a view of beliefs as
probabilities over sets of experts with a view of beliefs as extensions of probability
measures over labels, some kind of ergodicity assumption is needed to relate
distributions over the different spaces. It may well be that such assumptions can be
formulated to give a deeper theoretical basis for the application of canonical examples
to probabilistic situations with uncertainties. An advantage would be that
judgements of the applicability of the formulation could be based on the validity of the
assumption as opposed to the quality of empirical results. However, we do not
pursue such a plan here, preferring to view uncertainty as a measure of concurrence of
multiple opinions.

1.3.   Objectives 

This  paper  has  three  main  points.  First,  we  formulate  the  space  of  belief  states 
as  an  algebraic  structure,  pointing  out  in  the  process  that  the  normalization  term  in 
the  Dempster  rule  of  combination  is  essentially  irrelevant.  Our  reason  for  treating 
these  much-debated  and  motivated  concepts  in  terms  of  mathematical  structures 
such  as  semigroups  and  monoids  is  to  follow  Dempster's  early  admonition  to  avoid 
becoming  "sidetracked  into  doctrinaire  questions  concerning  whether  probabilities 
are  frequencies,  or  personal  degrees  of  belief,  or  betting  probabilities,  etc.,"  [4]. 
Having  formulated  the  Dempster/Shafer  theory  of  evidence  as  a  simple  algebraic 
structure,  we  can  discuss  interpretations  in  terms  of  their  isomorphic  relationship  to 
the  theory. 

We then describe spaces that we call probabilistic and boolean opinions of
experts. Our intent is to survey the foundations of the Dempster/Shafer theory in a
manner more accessible than the original Dempster works, and in a way that makes
clear the relationship to Bayesian analysis. The key point here is that rather than
extending Bayes' formula, the combination method is simply applying Bayes'
formula to sets of boolean opinions, updating on product sets of those opinions. The
idea of a class of opinions, rather than a single probabilistic current opinion, occurs
in the theory of lower probabilities [13], and is the theme of a unifying treatment of
evidential reasoning in [22]. In the theory of evidence, the opinions are
boolean-valued, giving lists of possible labels, and the state of the system is described by
the statistics of these opinions. In a medical diagnosis application, for example, the
range of opinions might be held by different doctors, and the opinions themselves
consist of lists of possible pathologies. The important distinction between measuring
statistics over the set of doctors and over the set of patients forms the basis for
measuring degrees of uncertainty.

Finally, we use the viewpoint established by these spaces, or canonical
examples, to introduce the main original contribution of this paper. We use the space of
probabilistic opinions of experts to define spaces that we call parameterized
statistics of opinions. The idea behind these spaces, and their use in tasks of evidential
reasoning, is fundamentally simple: a probabilistic opinion is maintained and updated,
as in Bayesian analysis with conditional independence, and a concurrent measure of
uncertainty is maintained in terms of a multivariate Gaussian distribution in
log-probability space. Once again, we have the idea of a spread of opinions, but founded
on notions of Bayes' theorem for updating, and with the connections to the
Dempster/Shafer theory made clear.

2.  The  Rule  of  Combination  and  Normalization 

The set of possible outcomes, or labelings, will be denoted in this paper by
$\Lambda$. This set is the "frame of discernment", and in other works has been denoted,
variously, by $\Omega$, $\Theta$, or $S$. For convenience, we will assume that $\Lambda$ is
a finite set with $n$ elements, although the framework could easily be extended to
continuous label sets. More importantly, we will assume that $\Lambda$ represents a set of
states that are mutually exclusive and exhaustive. If $\Lambda$ is not initially exhaustive,
it can easily be made so by including an additional label denoting "none of the above."
If $\Lambda$ is not mutually exclusive, it can be made so by replacement with its power set
(i.e., the set of all subsets), so that each subset represents the occurrence of exactly
that subset of labels, excluding all other labels. Of course, replacing $\Lambda$ by its
power set is perilous, in that it will greatly expand the cardinality of the label set.
For practical applications, the implementer is more likely to want to replace $\Lambda$ by
the set of all plausible subsets describing a valid configuration.

An element (or state of belief) in the theory of evidence is represented by a
probability distribution over the power set of $\Lambda$, $P(\Lambda)$. That is, a state $m$ is

$$m : P(\Lambda) \to [0,1],$$

$$\sum_{A \subseteq \Lambda} m(A) = 1.$$

There is an additional proviso that is typically applied, namely that every state $m$
satisfies

$$m(\emptyset) = 0.$$

Section  3.2  introduces  a  plausible   interpretation   for  the   quantities   comprising  a 
state. 

A state is updated by combination with new evidence, or information, which is
presented in the form of another state. Thus given a current state $m_1$, and another
state $m_2$, a combination of the two states is defined to yield a state $m_1 \oplus m_2$
given by

$$(m_1 \oplus m_2)(A) = \frac{\sum_{B \cap C = A} m_1(B)\, m_2(C)}{1 - \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C)}, \quad A \neq \emptyset, \tag{1a}$$

and

$$(m_1 \oplus m_2)(\emptyset) = 0.$$

This is the so-called "Dempster Rule of Combination." Note that the resulting
function $m_1 \oplus m_2$ is a probability mass due to the normalization factor, and that
$(m_1 \oplus m_2)(\emptyset) = 0$ by definition.

The problem with this definition is that the denominator in (1a) might be zero,
so that $(m_1 \oplus m_2)(A)$ is undefined. That is, there exist pairs $m_1$ and $m_2$ such
that the combination of $m_1$ and $m_2$ is not defined. This, of course, is not a very
satisfactory situation for a binary operation on a space. The solution which is frequently
taken is to avoid combining such elements. An alternative is to add an additional
element $m_0$ to the space:

$$m_0(A) = 0 \quad \text{for} \quad A \neq \emptyset,$$

$$m_0(\emptyset) = 1.$$

Note that this additional element does not satisfy the condition $m(\emptyset) = 0$. Then
define, as a special case,

$$m_1 \oplus m_2 = m_0 \quad \text{if} \quad \sum_{B \cap C = \emptyset} m_1(B)\, m_2(C) = 1. \tag{1b}$$

The binary operation is then defined for all pairs $m_1$, $m_2$. The special element $m_0$
is an absorbing state, in the sense that $m_0 \oplus m = m \oplus m_0 = m_0$ for all states $m$.
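
The normalized rule, including the special case for total conflict, can be
sketched as follows. This is an illustrative sketch under stated assumptions,
not a reference implementation: the names dempster, M0, and LAMBDA, the
dictionary representation of a state (subset to mass), the tolerance used to
detect a zero denominator, and the example masses are all hypothetical.

```python
from itertools import product

EMPTY = frozenset()
M0 = {EMPTY: 1.0}  # the absorbing element m_0


def dempster(m1, m2, tol=1e-12):
    """Normalized combination following Equations (1a) and (1b)."""
    raw = {}
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        a = b & c  # intersection of the two subsets
        raw[a] = raw.get(a, 0.0) + mb * mc
    conflict = raw.pop(EMPTY, 0.0)  # total mass of pairs with B and C disjoint
    if 1.0 - conflict <= tol:
        return dict(M0)  # Equation (1b): the denominator of (1a) vanishes
    return {a: v / (1.0 - conflict) for a, v in raw.items()}


# Hypothetical states over the label set {"x", "y", "z"}.
LAMBDA = frozenset({"x", "y", "z"})
m1 = {frozenset({"x"}): 0.6, LAMBDA: 0.4}
m2 = {frozenset({"y"}): 0.5, LAMBDA: 0.5}

m12 = dempster(m1, m2)
```

In this example the conflicting mass is 0.3, so the remaining products 0.3,
0.2, and 0.2 are divided by 0.7. Combining any state with M0 returns M0, since
every intersection with the empty set is empty.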

This space has an identity element. The identity state, $m_I$, represents complete
ignorance, in that combination with it yields no change (i.e., $m_I \oplus m = m \oplus m_I = m$
for all states $m$). This state places full mass on the subset which is all of $\Lambda$:

$$m_I(\Lambda) = 1,$$

$$m_I(A) = 0 \quad \text{for} \quad A \neq \Lambda.$$

Definition 1: We define $(M, \oplus)$, the space of belief states, by

$$M = \Big\{ m : P(\Lambda) \to \mathbb{R}^+ \cup \{0\} \;\Big|\; \sum_{A \subseteq \Lambda} m(A) = 1,\ m(\emptyset) = 0 \Big\} \cup \{m_0\},$$

and define $\oplus$ by (1a) when the denominator in (1a) is nonzero, and by (1b)
otherwise. ■

The set $M$, together with the combination operation $\oplus$, constitutes a monoid,
since the binary operation is closed and associative, and there is an identity
element.¹ In fact, the binary operation is commutative, so we can say that the space is
an abelian monoid.

Still, because of the normalization and the special case in the definition of $\oplus$,
the monoid $M$ is both ugly and cumbersome. It makes better sense to dispense with
the normalization. We have

Definition 2: We define $(M', \oplus')$, the space of unnormalized belief states, by

$$M' = \Big\{ m : P(\Lambda) \to \mathbb{R}^+ \cup \{0\} \;\Big|\; \sum_{A \subseteq \Lambda} m(A) = 1 \Big\}$$

without the additional proviso, and set

$$(m_1 \oplus' m_2)(A) = \sum_{B \cap C = A} m_1(B) \cdot m_2(C) \quad \forall A \subseteq \Lambda \tag{2}$$

for all pairs $m_1, m_2 \in M'$. ■

One can verify that $m_1 \oplus' m_2 \in M'$, and that $\oplus'$ is associative and commutative.
Further, the same element $m_I$ defined above is also in $M'$, and is an identity. Thus
$M'$ is also an abelian monoid. Clearly, $M'$ is a more attractive monoid than $M$.
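
The monoid properties of the unnormalized combination can be checked on a small
example. The sketch below is illustrative only: the function name
combine_unnormalized, the dictionary representation of a state, and the three
example states are hypothetical, and a numerical spot check on particular
states is of course not a proof.

```python
def combine_unnormalized(m1, m2):
    """Equation (2): multiply masses and intersect subsets, with no normalization."""
    out = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            a = b & c
            out[a] = out.get(a, 0.0) + mb * mc
    return out


# Hypothetical states over the label set {"x", "y", "z"}.
LAMBDA = frozenset({"x", "y", "z"})
M_I = {LAMBDA: 1.0}  # the identity state: full mass on the whole label set
m1 = {frozenset({"x"}): 0.6, LAMBDA: 0.4}
m2 = {frozenset({"y"}): 0.5, LAMBDA: 0.5}
m3 = {frozenset({"x", "y"}): 0.7, LAMBDA: 0.3}

m12 = combine_unnormalized(m1, m2)
left = combine_unnormalized(combine_unnormalized(m1, m2), m3)
right = combine_unnormalized(m1, combine_unnormalized(m2, m3))
```

Note that the result m12 keeps a mass of 0.3 on the empty set, which is exactly
the conflicting mass that the normalized rule would divide out, and that its
masses still sum to one.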

We define a transformation $V$ mapping $M'$ to $M$ by the formulas

$$(Vm)(A) = \frac{m(A)}{1 - m(\emptyset)}, \quad A \neq \emptyset, \tag{3}$$

$$(Vm)(\emptyset) = 0$$

if $m(\emptyset) \neq 1$, and

$$Vm = m_0$$

otherwise.

A computation shows that $V$ preserves the binary operation; i.e.,

$$V(m_1 \oplus' m_2) = V(m_1) \oplus V(m_2).$$

Thus $V$ is a homomorphism.² Further, $V$ is onto, since for $m \in M$, the same $m$ is in
$M'$, and $Vm = m$. The algebraic terminology is that $V$ is an epimorphism of
monoids, a fact that we record in

Lemma 1: $V$ maps homomorphically from $(M', \oplus')$ onto $(M, \oplus)$. ■

¹ A structure with a closed associative binary operation is sometimes called a semigroup, so that the space in
question is an abelian semigroup with an identity.

² Strictly speaking, this merely shows that $V$ is a homomorphism of semigroups; it is not hard to show that
$V$ maps the identity to the identity, which it must since it is onto, and thus it is also a homomorphism of
monoids.

A "representation" is a term that refers to a map that is an epimorphism of
structures. Intuitively, such a map is important because it allows us to consider
combination in the space formed by the range of the map as combinations of
preimage elements. Lemma 1 will eventually form a small part of a representation
to be defined in the next section. In the case in point, however, if it is required to
combine elements in $M$, one can perform the combinations in $M'$, and project to
$M$ by $V$ after all of the combinations are completed. Since combinations in $M'$ are
much cleaner, this is a potentially useful observation. In terms of the
Dempster/Shafer theory of evidence, this result says that the normalization in the
combination formula is essentially irrelevant, and that combining can be handled by
Equation (2). Specifically, given a sequence of states in $M$ to be combined, say
$m_1, m_2, \ldots, m_k$, we can regard these states as elements in $M'$. Since each $m_i$
satisfies $m_i(\emptyset) = 0$, they each satisfy $Vm_i = m_i$. Thus

$$V(m_1 \oplus' m_2 \oplus' \cdots \oplus' m_k) = Vm_1 \oplus \cdots \oplus Vm_k = m_1 \oplus \cdots \oplus m_k,$$

which says that it suffices to compute the combinations using $\oplus'$ (Equation (2)),
and then project by $V$ (Equation (3)). Of course, the final projection is necessary only
if we absolutely insist on a result in $M$. If any more combining is to be done, or if we
are reasonably broad-minded, intermediate results can be interpreted directly as
elements in $M'$.
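
The observation that normalization can be deferred to the end can be checked
numerically. The following sketch assumes a dictionary representation of states
and uses hypothetical names (combine_unnormalized, project) and hypothetical
example masses. The stepwise loop normalizes after every pairwise combination,
which is what Equations (1a) and (1b) prescribe for states with no mass on the
empty set; the deferred loop uses Equation (2) throughout and applies V once.

```python
EMPTY = frozenset()
M0 = {EMPTY: 1.0}  # the absorbing element m_0


def combine_unnormalized(m1, m2):
    """Equation (2): multiply masses and intersect subsets, with no normalization."""
    out = {}
    for b, mb in m1.items():
        for c, mc in m2.items():
            out[b & c] = out.get(b & c, 0.0) + mb * mc
    return out


def project(m, tol=1e-12):
    """Equation (3): the map V discards the mass on the empty set and rescales."""
    rest = 1.0 - m.get(EMPTY, 0.0)
    if rest <= tol:
        return dict(M0)  # all mass is on the empty set, so V m = m_0
    return {a: v / rest for a, v in m.items() if a != EMPTY}


# Hypothetical states over the label set {"x", "y", "z"}.
LAMBDA = frozenset({"x", "y", "z"})
states = [
    {frozenset({"x"}): 0.6, LAMBDA: 0.4},
    {frozenset({"y"}): 0.5, LAMBDA: 0.5},
    {frozenset({"x", "y"}): 0.7, LAMBDA: 0.3},
]

# Normalize after every pairwise combination.
stepwise = states[0]
for m in states[1:]:
    stepwise = project(combine_unnormalized(stepwise, m))

# Combine with Equation (2) throughout and apply V once at the end.
deferred = states[0]
for m in states[1:]:
    deferred = combine_unnormalized(deferred, m)
deferred = project(deferred)
```

Up to floating-point rounding, the two results agree subset by subset, which is
the content of Lemma 1 for this particular sequence of states.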

3.  Spaces  of  Opinions  of  Experts 

In this section, we introduce two new spaces, based on the opinions of sample
spaces of experts, and discuss the evaluation of statistics of experts' opinions.
Finally, we interpret the combination rules in these spaces as being a form of
Bayesian updating. In the following section we will show that these spaces also map
homomorphically onto the space of belief states.

3.1.   Opinions  of  Experts 

We consider a set $\mathcal{E}$ of "experts", together with a map $\mu$ giving a weight or
strength for each expert. It is convenient to think of $\mathcal{E}$ as a large but finite set,
although the essential restriction is that $\mathcal{E}$ should be a measure space. Each expert
$\omega \in \mathcal{E}$ maintains a list of possible labels: Dempster uses the notation
$\Gamma(\omega)$ for this subset; i.e., $\Gamma(\omega) \subseteq \Lambda$. Here we will assume
that each expert $\omega$ has not just a subset of possibilities $\Gamma(\omega)$, but also a
probabilistic opinion $p_\omega$ defined on $\Lambda$ satisfying

$$p_\omega(\lambda) \ge 0, \quad \forall \lambda \in \Lambda,$$

$$p_\omega(\lambda) > 0 \ \text{ iff } \ \lambda \in \Gamma(\omega),$$

and

$$\Big( \sum_{\lambda} p_\omega(\lambda) = 1 \quad \text{or} \quad p_\omega(\lambda) = 0 \ \ \forall \lambda \Big), \quad \forall \omega \in \mathcal{E}.$$

As suggested by the notation, $p_\omega(\lambda)$ represents expert $\omega$'s assessment of the
probability of occurrence of the label $\lambda$. If an expert $\omega$ believes that a label
$\lambda$ is possible, i.e., $\lambda \in \Gamma(\omega)$, then the associated probability estimate
$p_\omega(\lambda)$ will be nonzero. Conversely, if $\omega$ thinks that $\lambda$ is impossible
($\lambda \notin \Gamma(\omega)$), then $p_\omega(\lambda) = 0$. We also include the possibility
that expert $\omega$ has no opinion, which is indicated by the special element $p_\omega \equiv 0$.
This state is included in order to ensure that the binary operation, to be defined later, is
closed. We denote the collection of maps $\{p_\omega \mid \omega \in \mathcal{E}\}$ by $P$.

It will turn out that the central point in the theory of evidence is that the
$p_\omega(\lambda)$ data is used only in terms of a test for zero. Specifically, we set

$$x_\omega(\lambda) = \begin{cases} 1 & \text{if } p_\omega(\lambda) > 0 \\ 0 & \text{if } p_\omega(\lambda) = 0. \end{cases} \tag{4}$$

Note that $x_\omega$ is the characteristic function of the set $\Gamma(\omega)$ over $\Lambda$, i.e.,
$x_\omega(\lambda) = 1$ iff $\lambda \in \Gamma(\omega)$. The collection of all $x_\omega$'s will be
denoted by $X$, and will be called the boolean opinions of the experts $\mathcal{E}$.
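
The test for zero can be sketched in a few lines of Python. This is only an illustration: the dictionary representation of an opinion, the function names, and the three labels are assumptions made for the example and are not part of the paper.

```python
def boolean_opinion(p):
    """Test for zero: x(label) = 1 if p(label) > 0, else 0."""
    return {label: 1 if prob > 0 else 0 for label, prob in p.items()}


def possibility_set(p):
    """Gamma(omega): the labels the expert regards as possible."""
    return frozenset(label for label, prob in p.items() if prob > 0)


# Hypothetical expert over a three-label set {"a", "b", "c"}.
p_omega = {"a": 0.7, "b": 0.3, "c": 0.0}
x_omega = boolean_opinion(p_omega)
gamma_omega = possibility_set(p_omega)

# An expert with no opinion (p identically zero) maps to x identically zero.
x_none = boolean_opinion({"a": 0.0, "b": 0.0, "c": 0.0})
```

Note that the strengths 0.7 and 0.3 are discarded: only the support of the opinion survives.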

If we regard the space of experts $\mathcal{E}$ as a sample space, then each $x_\omega(\lambda)$
can be regarded as a sample of a random (boolean) variable $x(\lambda)$. In a similar way, the
$p_\omega(\lambda)$'s are also samples of random variables $p(\lambda)$. The state of the system
will be defined by statistics on the set of random variables $\{x(\lambda)\}_{\lambda \in \Lambda}$.
These statistics are measured over the space of experts. If all experts have the same opinion,
then the state should describe that set of possibilities, and the fact that there is a unanimity
of opinion. If there is a divergence of opinions, the state should record the fact.

To compute statistics, we view $\mathcal{E}$ as a sample space with prior weights given by
$\mu$. We extend $\mu$ to a measure on $\mathcal{E}$, completely determined by the weights of the
individual experts $\mu(\{\omega\})$ for $\omega \in \mathcal{E}$. (We are assuming that
$\mathcal{E}$ is finite.) That is, for a subset $F \subseteq \mathcal{E}$,

$$\mu(F) = \sum_{\omega \in F} \mu(\{\omega\}).$$

If all experts have equal weights, then $\mu$ is equivalent to a counting measure, and
statistics are then measured in terms of percentages of experts. For minor technical
reasons (explained in Section 4), we allow weights on the experts, so that statistics
on the $x(\lambda)$'s are in terms of weighted percentages.

We are now ready to introduce the spaces which we will term "opinions of
experts." The central point is that the set of labels $\Lambda$ is fixed, but that the set of
experts $\mathcal{E}$ can be different for distinct elements in these spaces. For the first
space, we also use a fixed set of positive constants $\kappa_\lambda$, one for each label, that
will eventually be set to the prior probability for the label $\lambda$.

Definition 3: Let $K = \{\kappa_\lambda\}$ be a set of positive constants indexed over the label
set $\Lambda$. The space of probabilistic opinions of experts $(\mathcal{N}, K, \otimes)$ is
defined by

$$\mathcal{N} = \Big\{ (\mathcal{E}, \mu, P) \ \Big|\ \#\mathcal{E} < \infty,\ \mu \text{ is a measure on } \mathcal{E},\ P = \{p_\omega\}_{\omega \in \mathcal{E}},\ p_\omega : \Lambda \to [0,1] \ \forall \omega, \text{ and } \forall \omega,\ \sum_{\lambda} p_\omega(\lambda) = 1 \text{ or } p_\omega \equiv 0 \Big\}.$$

As noted earlier, the requirement that $\#\mathcal{E} < \infty$ is for clarity of presentation;
Dempster defines the space $\mathcal{N}$ in a more general setting.

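We define a binary operation on $\mathcal{N}$ as follows. Given $(\mathcal{E}_1, \mu_1, P_1)$ and
$(\mathcal{E}_2, \mu_2, P_2)$ elements in $\mathcal{N}$, define

$$(\mathcal{E}, \mu, P) = (\mathcal{E}_1, \mu_1, P_1) \otimes (\mathcal{E}_2, \mu_2, P_2)$$

by

$$\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2 = \{ (\omega_1, \omega_2) \mid \omega_1 \in \mathcal{E}_1,\ \omega_2 \in \mathcal{E}_2 \},$$

$$\mu(\{(\omega_1, \omega_2)\}) = \mu_1(\{\omega_1\}) \cdot \mu_2(\{\omega_2\}),$$

and

$$P = \{ p_{(\omega_1,\omega_2)} \}_{(\omega_1,\omega_2) \in \mathcal{E}}, \qquad p_{(\omega_1,\omega_2)}(\lambda) = \frac{p^{(1)}_{\omega_1}(\lambda)\, p^{(2)}_{\omega_2}(\lambda)\, \kappa_\lambda^{-1}}{\sum_{\lambda'} p^{(1)}_{\omega_1}(\lambda')\, p^{(2)}_{\omega_2}(\lambda')\, \kappa_{\lambda'}^{-1}},$$

providing the denominator is nonzero, and

$$p_{(\omega_1,\omega_2)} \equiv 0$$

otherwise. Here, $P_i = \{p^{(i)}_{\omega_i}\}_{\omega_i \in \mathcal{E}_i}$ for $i = 1,2$, and the
$\kappa_\lambda$'s are a fixed set of positive constants defined for $\lambda \in \Lambda$. ■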
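
The combining operation of Definition 3 can be sketched as follows. This is a minimal illustration under stated assumptions: an element of the space is represented as a dictionary from a (hypothetical) expert name to a pair (weight, opinion), and the function name `combine_probabilistic` is invented for the example.

```python
from itertools import product


def combine_probabilistic(space1, space2, kappa):
    """Combine two expert spaces by the rule of Definition 3.

    Each space maps an expert name to (weight, opinion), where opinion maps
    every label to a probability. kappa maps every label to a positive constant.
    """
    combined = {}
    for (w1, (mu1, p1)), (w2, (mu2, p2)) in product(space1.items(), space2.items()):
        # Numerator of the rule: p1 * p2 / kappa, label by label.
        unnormalized = {lab: p1[lab] * p2[lab] / kappa[lab] for lab in kappa}
        total = sum(unnormalized.values())
        if total > 0:
            opinion = {lab: v / total for lab, v in unnormalized.items()}
        else:
            # Zero denominator: the committee has no opinion.
            opinion = {lab: 0.0 for lab in kappa}
        # Committee weights multiply.
        combined[(w1, w2)] = (mu1 * mu2, opinion)
    return combined


# Hypothetical two-label example.
kappa = {"a": 0.5, "b": 0.5}
space1 = {"e1": (1.0, {"a": 0.8, "b": 0.2}), "e2": (1.0, {"a": 0.0, "b": 1.0})}
space2 = {"f1": (2.0, {"a": 1.0, "b": 0.0})}
committees = combine_probabilistic(space1, space2, kappa)
```

In this example the committee `("e2", "f1")` consists of two experts whose possibility sets are disjoint, so its combined opinion is the special "no opinion" element.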

To interpret this combining operation, consider two sets of experts $\mathcal{E}_1$ and
$\mathcal{E}_2$, with each set of experts expressing opinions in the form of $P_1$ and $P_2$. We
form a new set of experts, which is simply the set of all committees of two, consisting of
one expert from $\mathcal{E}_1$, and another from $\mathcal{E}_2$. In each of the committees, the
members confer to determine a consensus opinion. In Section 3.3, we will see how to interpret
the formulas as Bayesian combination (where $\kappa_\lambda$ is the prior probability on
$\lambda$). And in the following section we will show that this space maps homomorphically
onto the belief spaces. Finally, if as in Dempster [5], we only regard the opinions
of these experts in terms of a test for zero (i.e., disregarding the strength of nonzero
opinions), we arrive at yet another space. A depiction of the combination of two
boolean opinions is shown in Figure 1.

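Definition 4: The space of boolean opinions of experts, $(\mathcal{N}', \odot)$, is defined
similarly:

$$\mathcal{N}' = \big\{ (\mathcal{E}, \mu, X) \ \big|\ \#\mathcal{E} < \infty,\ \mu \text{ is a measure on } \mathcal{E},\ X = \{x_\omega\}_{\omega \in \mathcal{E}},\ x_\omega : \Lambda \to \{0,1\} \ \forall \omega \big\}.$$

If $(\mathcal{E}_1, \mu_1, X_1)$ and $(\mathcal{E}_2, \mu_2, X_2)$ are elements in $\mathcal{N}'$,
define their product

$$(\mathcal{E}, \mu, X) = (\mathcal{E}_1, \mu_1, X_1) \odot (\mathcal{E}_2, \mu_2, X_2)$$

by

$$\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2 = \{ (\omega_1, \omega_2) \mid \omega_1 \in \mathcal{E}_1,\ \omega_2 \in \mathcal{E}_2 \},$$

$$\mu(\{(\omega_1, \omega_2)\}) = \mu_1(\{\omega_1\}) \cdot \mu_2(\{\omega_2\}),$$

and

$$X = \{ x_{(\omega_1,\omega_2)} \}_{(\omega_1,\omega_2) \in \mathcal{E}}, \qquad x_{(\omega_1,\omega_2)}(\lambda) = x^{(1)}_{\omega_1}(\lambda) \cdot x^{(2)}_{\omega_2}(\lambda),$$

where $X_i = \{ x^{(i)}_{\omega_i} \mid \omega_i \in \mathcal{E}_i \}$, for $i = 1,2$. ■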
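
Because a boolean opinion is the characteristic function of a possibility set, the pointwise product of two boolean opinions is the characteristic function of the intersection of the two sets. The sketch below uses that observation; the representation of a space as a dictionary from a (hypothetical) expert name to a pair (weight, possibility set) is an assumption made for the example.

```python
from itertools import product


def combine_boolean(space1, space2):
    """Product of Definition 4, with each boolean opinion stored as the
    frozenset of labels on which it equals 1."""
    return {
        (w1, w2): (mu1 * mu2, g1 & g2)  # weights multiply, sets intersect
        for (w1, (mu1, g1)), (w2, (mu2, g2)) in product(space1.items(), space2.items())
    }


# Hypothetical example over the labels {"a", "b", "c"}.
space1 = {"e1": (1, frozenset({"a", "b"})), "e2": (3, frozenset({"b", "c"}))}
space2 = {"f1": (2, frozenset({"a"})), "f2": (1, frozenset({"b", "c"}))}
committees = combine_boolean(space1, space2)
```

The committee `("e2", "f1")` ends up with the empty possibility set, which corresponds to a committee giving no possibilities.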

Figure 1. A depiction of the combination of two boolean opinions of two experts, as is
present in combinations in $\mathcal{N}'$, yielding a consensus opinion by the element
$(\omega_1, \omega_2)$ in the product set of experts formed by the committee of two.


3.2.   Statistics  of  Experts 

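For a given subset $A \subseteq \Lambda$, the characteristic function $\chi_A$ is defined by

$$\chi_A(\lambda) = \begin{cases} 0 & \text{if } \lambda \notin A \\ 1 & \text{if } \lambda \in A. \end{cases}$$

Equality of two functions defined on $\Lambda$ means, of course, that the two functions
agree for all $\lambda \in \Lambda$. That is, $x_\omega = \chi_A$ means

$$x_\omega(\lambda) = \chi_A(\lambda) \quad \forall \lambda \in \Lambda,$$

which is the same thing as saying $\Gamma(\omega) = A$.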

Given a space of experts $\mathcal{E}$ and the boolean opinions $X$, we define

$$m(A) = \frac{\mu(\{ \omega \in \mathcal{E} \mid x_\omega = \chi_A \})}{\mu(\mathcal{E})} \tag{5}$$

for every subset $A \subseteq \Lambda$. It is possible to view the values as probabilities on the
random variables $\{x(\lambda)\}$. We endow the elements of $\mathcal{E}$ with the prior
probabilities $\mu(\{\omega\})/\mu(\mathcal{E})$, and say that the probability of an event
involving a combination of the random variables $x(\lambda)$ over the sample space
$\mathcal{E}$ is the probability that the event is true for a particular sample, where the
sample is chosen at random from $\mathcal{E}$ with the sampling distribution given by the prior
probabilities. This is equivalent to saying

$$\operatorname{Prob}_{\mathcal{E}}(\text{Event}) = \frac{\mu(\{ \omega \in \mathcal{E} \mid \text{Event is true for } \omega \})}{\mu(\mathcal{E})}.$$

With this convention, we see that

$$m(A) = \operatorname{Prob}_{\mathcal{E}}\big( x(\lambda) = \chi_A(\lambda) \text{ for all } \lambda \big).$$

In fact, all of the priors and joint statistics of the $x(\lambda)$'s are determined by the full
collection of $m(A)$ values. For example,

$$\operatorname{Prob}\big( x(\lambda_0) = 1 \big) = \sum_{\{A \mid \lambda_0 \in A\}} m(A)$$

and

$$\operatorname{Prob}\big( x(\lambda_0) = 1 \text{ and } x(\lambda_1) = 1 \big) = \sum_{\{A \mid \lambda_0, \lambda_1 \in A\}} m(A).$$

Further, the full set of values $m(A)$ for $A \subseteq \Lambda$ defines an element
$m \in \mathcal{M}'$. To see this, it suffices to check that $\sum_A m(A) = 1$, which amounts to
observing that for every $\omega$, $x_\omega = \chi_A$ for some $A \subseteq \Lambda$.
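
As an illustration of Equation (5), the following sketch computes the values m(A) as weighted fractions of experts and then recovers a marginal statistic from them. The representation (a dictionary from a hypothetical expert name to a pair of weight and possibility set) and the function name `mass_from_boolean` are assumptions made for the example; exact rational arithmetic is used so that the sums can be checked exactly.

```python
from collections import defaultdict
from fractions import Fraction


def mass_from_boolean(space):
    """m(A) = weight of experts whose possibility set is exactly A,
    divided by the total weight of all experts."""
    total = sum(Fraction(weight) for weight, _ in space.values())
    m = defaultdict(Fraction)
    for weight, gamma in space.values():
        m[gamma] += Fraction(weight) / total
    return dict(m)


# Hypothetical space: three experts over the labels {"a", "b"}.
space = {
    "e1": (1, frozenset({"a"})),
    "e2": (1, frozenset({"a", "b"})),
    "e3": (2, frozenset()),  # this expert gives no possibilities
}
m = mass_from_boolean(space)

# Prob(x("a") = 1) is the sum of m(A) over all subsets A containing "a".
prob_a = sum(value for subset, value in m.items() if "a" in subset)
```

Here half of the total weight belongs to an expert who gives no possibilities, so m assigns mass 1/2 to the empty set.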

Recalling the definition of $V$ (Equation (3)), we may also consider the numbers
$(Vm)(A)$. These values can also be interpreted as probabilities, providing we define
probability in a way which ignores experts who give no possibilities, and providing
there are some experts who give some possibilities (i.e., $m(\emptyset) \ne 1$). Then for
$A \ne \emptyset$,

$$\bar{m}(A) = (Vm)(A) = \frac{m(A)}{1 - m(\emptyset)}$$

is the probability that a randomly chosen expert $\omega$ will state that the subset of
possibilities is precisely $A$, conditioned on the requirement that the expert gives at least
one possibility.

Under the assumptions that $A \ne \emptyset$, $m(\emptyset) \ne 1$, and that probability is
measured over the set of experts expressing an opinion
$\mathcal{E}' = \{ \omega \mid x_\omega \not\equiv 0 \}$, many of the quantities in the theory of
evidence can be interpreted in terms of familiar statistics on the $x(\lambda)$'s. For example,
the belief on a set $A$,

$$\operatorname{Bel}(A) = \sum_{B \subseteq A} \bar{m}(B),$$

is simply the joint probability

$$\operatorname{Bel}(A) = \operatorname{Prob}_{\mathcal{E}'}\big( x(\lambda) = 0 \text{ for } \lambda \notin A \big).$$

Note that the prior probabilities on the experts in $\mathcal{E}'$ are given by
$\mu(\{\omega\})/\mu(\mathcal{E}')$. The denominator in these priors is nonzero due to the
assumption that $m(\emptyset) \ne 1$.

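In a similar way, plausibility values

$$\operatorname{Pl}(A) = \sum_{B \cap A \ne \emptyset} \bar{m}(B) = 1 - \operatorname{Bel}(\bar{A})$$

(where $\bar{A}$ denotes the complement of $A$ in $\Lambda$) can be interpreted as disjunctive
probabilities

$$\operatorname{Pl}(A) = \operatorname{Prob}_{\mathcal{E}'}\big( x(\lambda) = 1 \text{ for some } \lambda \in A \big).$$

The beliefs and plausibilities are the lower and upper probabilities as defined by
Dempster. The commonality values

$$Q(A) = \sum_{A \subseteq B} \bar{m}(B)$$

are joint probabilities:

$$Q(A) = \operatorname{Prob}_{\mathcal{E}'}\big( x(\lambda) = 1 \text{ for } \lambda \in A \big).$$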

To recapitulate, we have defined a mapping from $P$ values to $X$ values, and
then transformations from $X$ to $m$ and $\bar{m}$ values. The resulting element $\bar{m}$, which
contains statistics on the $X$ variables, is an element in the space of belief states
$\mathcal{M}$ of the Dempster/Shafer theory of evidence (Section 2).
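
The normalization and the three derived quantities of this section can be sketched as follows. The function names are invented for the example, masses are stored as dictionaries from frozensets of labels to exact fractions, and the normalization shown is the one stated above (divide by one minus the mass of the empty set, and drop the empty set); it is a sketch under those assumptions rather than a definitive implementation.

```python
from fractions import Fraction


def normalize(m):
    """Condition on experts who give at least one possibility."""
    empty = frozenset()
    m_empty = m.get(empty, Fraction(0))
    if m_empty == 1:
        raise ValueError("no expert gives any possibility")
    return {A: value / (1 - m_empty) for A, value in m.items() if A != empty}


def belief(m_bar, A):
    """Sum of masses of subsets B contained in A."""
    return sum(value for B, value in m_bar.items() if B <= A)


def plausibility(m_bar, A):
    """Sum of masses of subsets B that intersect A."""
    return sum(value for B, value in m_bar.items() if B & A)


def commonality(m_bar, A):
    """Sum of masses of subsets B that contain A."""
    return sum(value for B, value in m_bar.items() if A <= B)


# Hypothetical unnormalized mass over the labels {"a", "b"}.
m = {
    frozenset(): Fraction(1, 2),
    frozenset({"a"}): Fraction(1, 4),
    frozenset({"a", "b"}): Fraction(1, 4),
}
m_bar = normalize(m)
A, B = frozenset({"a"}), frozenset({"b"})
```

For this two-label example, `B` is the complement of `A`, so the identity relating plausibility of a set to belief in its complement can be checked directly.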

3.3.  Bayesian  Interpretation 

We now interpret the manner in which pairs of experts achieve a consensus
opinion. We will show that the combination formulas given for $\mathcal{N}$ and $\mathcal{N}'$
are consistent with a Bayesian interpretation. Our treatment is standard.

We first consider the combination of $(\mathcal{E}_1, \mu_1, P_1)$ and
$(\mathcal{E}_2, \mu_2, P_2)$ in $\mathcal{N}$. We assume that the experts in $\mathcal{E}_j$ have
available to them information $s_j$. Note that all experts in a given set of experts share the
same information. The information $s_j$ consists of boolean predicates constituting evidence
about the labeling situation. For example, in a medical diagnosis application, $s_j$ might
consist of statements about the presence or absence of a set of symptoms. Each set of
experts $\mathcal{E}_j$ deals with a different set of symptoms.

In general, the information $s_j$ is the result of a set of tests having boolean outcomes.
We could write $s_j = f_j(\sigma)$, where $f_j$ represents the tests, and $\sigma$ is the
current situation, which is an element in some sample space of labeling problems
$\sigma \in \Sigma$. Assuming $\Sigma$ is also a measure space, there are prior probabilities on
the information coefficients:

$$\operatorname{Prob}(s_j) = \operatorname{Prob}\big( f_j(\sigma) = s_j \big).$$

There are also prior probabilities on the true label $\lambda(\sigma)$ for labeling situation
$\sigma$, given by

$$\operatorname{Prob}(\lambda) = \operatorname{Prob}\big( \lambda(\sigma) = \lambda \big).$$

Note that these probabilities are not measured over the space of experts $\mathcal{E}$, but
instead are measured over the collection of instances $\Sigma$ of the labeling problem.
For example, in a medical diagnosis domain, $\Sigma$ might represent the set of all
patients.

For $j = 1,2$, we will suppose that $p^{(j)}_{\omega_j}(\lambda)$ represents expert $\omega_j$'s
estimate of

$$\operatorname{Prob}(\lambda \mid s_j),$$

the probability (over $\Sigma$) that $\lambda(\sigma) = \lambda$ conditioned on
$f_j(\sigma) = s_j$. The "expert" $(\omega_1, \omega_2)$ should then estimate
$\operatorname{Prob}(\lambda \mid s_1, s_2)$, which is the probability that
$\lambda(\sigma) = \lambda$ given that $f_1(\sigma) = s_1$ and $f_2(\sigma) = s_2$, thus combining
the two bodies of evidence seen by the two experts in that committee. This committee proceeds
as follows:

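Bayes' formula implies that

$$\operatorname{Prob}(\lambda \mid s_1, s_2) = \frac{\operatorname{Prob}(\lambda) \cdot \operatorname{Prob}(s_1, s_2 \mid \lambda)}{\operatorname{Prob}(s_1, s_2)} = \frac{\operatorname{Prob}(\lambda) \cdot \operatorname{Prob}(s_1 \mid \lambda) \cdot \operatorname{Prob}(s_2 \mid s_1, \lambda)}{\operatorname{Prob}(s_1, s_2)}.$$

Applying Bayes' formula to $\operatorname{Prob}(s_1 \mid \lambda)$, this becomes

$$\frac{\operatorname{Prob}(s_1)}{\operatorname{Prob}(s_1, s_2)} \cdot \operatorname{Prob}(\lambda \mid s_1) \cdot \operatorname{Prob}(s_2 \mid s_1, \lambda). \tag{6}$$

At this point we assume that

$$\operatorname{Prob}(s_2 \mid s_1, \lambda) = \operatorname{Prob}(s_2 \mid \lambda). \tag{7}$$

Using this assumption, we obtain by combining (6) and (7), and applying Bayes'
formula to $\operatorname{Prob}(s_2 \mid \lambda)$,

$$\operatorname{Prob}(\lambda \mid s_1, s_2) = c(s_1, s_2) \cdot \frac{\operatorname{Prob}(\lambda \mid s_1) \cdot \operatorname{Prob}(\lambda \mid s_2)}{\operatorname{Prob}(\lambda)}, \tag{8}$$

where $c(s_1, s_2)$ is a constant independent of $\lambda$. Using Equation (8), expert
$(\omega_1, \omega_2)$ estimates that

$$p_{(\omega_1,\omega_2)}(\lambda) = c(s_1, s_2) \cdot \frac{p^{(1)}_{\omega_1}(\lambda)\, p^{(2)}_{\omega_2}(\lambda)}{\kappa_\lambda}, \tag{9}$$

based on the independence assumption (7), where $\kappa_\lambda = \operatorname{Prob}(\lambda)$.
Since the left hand side of this equation should sum to 1 over $\lambda$, we have that

$$c(s_1, s_2) = \Big[ \sum_{\lambda'} p^{(1)}_{\omega_1}(\lambda')\, p^{(2)}_{\omega_2}(\lambda')\, \kappa_{\lambda'}^{-1} \Big]^{-1}, \tag{10}$$

unless, of course, this denominator is zero, in which case we resort to setting
$p_{(\omega_1,\omega_2)} \equiv 0$. Combining (9) and (10) gives the combination formula given in
Definition 3. Thus, we have shown that combination in $\mathcal{N}$ is a form of Bayesian
updating of pairs of experts, based on an independence assumption.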
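
The agreement between the committee rule and direct Bayesian updating can be checked numerically. The sketch below builds a small model in which the two pieces of evidence are conditionally independent given the label, so that assumption (7) holds by construction, and compares the committee's combined opinion with the posterior computed directly from the joint model. The labels "flu" and "cold", the prior, and the likelihood values are all hypothetical numbers chosen for the illustration.

```python
from fractions import Fraction as F

# Hypothetical prior over two labels, and likelihoods Prob(s_j | label)
# for two observed pieces of evidence s_1 and s_2.
prior = {"flu": F(1, 4), "cold": F(3, 4)}
like1 = {"flu": F(4, 5), "cold": F(1, 5)}
like2 = {"flu": F(1, 2), "cold": F(1, 10)}


def posterior_single(like):
    """Prob(label | s_j) by Bayes' formula."""
    z = sum(prior[lab] * like[lab] for lab in prior)
    return {lab: prior[lab] * like[lab] / z for lab in prior}


def combine(p1, p2, kappa):
    """Committee rule: p1 * p2 / kappa, renormalized over labels."""
    unnormalized = {lab: p1[lab] * p2[lab] / kappa[lab] for lab in kappa}
    z = sum(unnormalized.values())
    return {lab: v / z for lab, v in unnormalized.items()}


# Each expert reports an exact posterior given its own evidence.
p1 = posterior_single(like1)
p2 = posterior_single(like2)
committee = combine(p1, p2, prior)

# Direct posterior given both pieces of evidence, using conditional
# independence: Prob(s_1, s_2 | label) = Prob(s_1 | label) * Prob(s_2 | label).
z = sum(prior[lab] * like1[lab] * like2[lab] for lab in prior)
direct = {lab: prior[lab] * like1[lab] * like2[lab] / z for lab in prior}
```

With exact fractions, the two dictionaries `committee` and `direct` coincide, which is the content of the derivation above when the constants are set to the prior probabilities.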

To interpret the combination formula of $\mathcal{N}'$ in a Bayesian fashion, a weaker
independence assumption suffices. The combination formula can be restated as:

$$x_{(\omega_1,\omega_2)}(\lambda) = 0 \ \text{ iff } \ x^{(1)}_{\omega_1}(\lambda) = 0 \ \text{ or } \ x^{(2)}_{\omega_2}(\lambda) = 0.$$

Using Bayes' formula, and assuming that all prior probabilities are nonzero, it suffices
to show that

$$\operatorname{Prob}(s_1, s_2 \mid \lambda) = 0 \ \text{ iff } \ \operatorname{Prob}(s_1 \mid \lambda) = 0 \ \text{ or } \ \operatorname{Prob}(s_2 \mid \lambda) = 0.$$

The "if" part follows since

$$\operatorname{Prob}(s_1, s_2 \mid \lambda) = \operatorname{Prob}(s_1 \mid \lambda) \cdot \operatorname{Prob}(s_2 \mid s_1, \lambda) = \operatorname{Prob}(s_2 \mid \lambda) \cdot \operatorname{Prob}(s_1 \mid s_2, \lambda).$$

The "only if" part becomes our independence assumption, and is equivalent to

$$\operatorname{Prob}(s_1 \mid \lambda) > 0 \ \text{ and } \ \operatorname{Prob}(s_2 \mid \lambda) > 0 \ \Longrightarrow \ \operatorname{Prob}(s_1, s_2 \mid \lambda) > 0. \tag{11}$$

This assumption is implied by our earlier hypothesis (7). However, assumption (11)
is more defensible, and is actually all that is needed to regard updating in the space
of "boolean opinions of experts," $\mathcal{N}'$, as Bayesian. Since the Dempster/Shafer
theory deals only with the boolean opinions, Equation (11) is the required independence
assumption.

4.   Equivalence  with  the  Dempster/Shafer  Rule  of  Combination 

At this point, we have four spaces with binary operations, namely $(\mathcal{N}, \otimes)$,
$(\mathcal{N}', \odot)$, $(\mathcal{M}', \oplus')$, and $(\mathcal{M}, \oplus)$. We will now show
that these four spaces are closely related. It is not hard to show that the binary operation
is, in all four cases, commutative and associative, and that each space has an identity
element, so that these spaces are abelian monoids. We also have the following maps.

Definition 5: The map $T$

$$T : \mathcal{N} \to \mathcal{N}',$$

with $(\mathcal{E}, \mu, X) = T(\mathcal{E}, \mu, P)$, is given by equation (4), i.e.,
$x_\omega(\lambda) = 1$ iff $p_\omega(\lambda) > 0$, and $x_\omega(\lambda) = 0$ otherwise. ■

There is another mapping $U$, given by

Definition 6:

$$U : \mathcal{N}' \to \mathcal{M}'$$

with $m = U(\mathcal{E}, \mu, X)$ given by equation (5), i.e.,

$$m(A) = \mu(\{ \omega \in \mathcal{E} \mid x_\omega = \chi_A \}) / \mu(\mathcal{E}). \quad ■$$

We will show that $T$ and $U$ preserve the binary operations. More formally, we
show that $T$ and $U$ are homomorphisms of monoids.

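Lemma 2: $T$ is a homomorphism from $\mathcal{N}$ onto $\mathcal{N}'$.

Proof: It is a simple matter to verify that

$$T(\mathcal{E}_1, \mu_1, P_1) \odot T(\mathcal{E}_2, \mu_2, P_2) = T\big( (\mathcal{E}_1, \mu_1, P_1) \otimes (\mathcal{E}_2, \mu_2, P_2) \big).$$

The essential point, it turns out, is that since the probabilistic opinions are all
nonnegative,

$$p^{(1)}_{\omega_1}(\lambda) \cdot p^{(2)}_{\omega_2}(\lambda) > 0 \ \text{ iff } \ p^{(1)}_{\omega_1}(\lambda) > 0 \ \text{ and } \ p^{(2)}_{\omega_2}(\lambda) > 0.$$

$T$ is easily seen to be onto. ■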

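Lemma 3: $U$ is a homomorphism of $\mathcal{N}'$ onto $\mathcal{M}'$.

Proof: Consider $(\mathcal{E}, \mu, X) = (\mathcal{E}_1, \mu_1, X_1) \odot (\mathcal{E}_2, \mu_2, X_2)$.
For each $\omega_1 \in \mathcal{E}_1$ and $\omega_2 \in \mathcal{E}_2$, the corresponding
$x^{(1)}_{\omega_1}$ and $x^{(2)}_{\omega_2}$ are characteristic functions of subsets of $\Lambda$,
say $\chi_B$ and $\chi_C$ respectively. It is clear that

$$x^{(1)}_{\omega_1} \cdot x^{(2)}_{\omega_2} = \chi_A \ \text{ iff } \ B \cap C = A.$$

Thus

$$x_{(\omega_1,\omega_2)} = \chi_A \ \text{ iff } \ x^{(1)}_{\omega_1} = \chi_B \ \text{ and } \ x^{(2)}_{\omega_2} = \chi_C \ \text{ where } \ B \cap C = A.$$

So

$$\{ (\omega_1, \omega_2) \in \mathcal{E} \mid x_{(\omega_1,\omega_2)} = \chi_A \} = \bigcup_{B \cap C = A} \{ \omega_1 \mid x^{(1)}_{\omega_1} = \chi_B \} \times \{ \omega_2 \mid x^{(2)}_{\omega_2} = \chi_C \}.$$

Since this is a disjoint union, using properties of measures, this gives

$$\mu(\{ (\omega_1, \omega_2) \in \mathcal{E} \mid x_{(\omega_1,\omega_2)} = \chi_A \}) = \sum_{B \cap C = A} \mu_1(\{ \omega_1 \mid x^{(1)}_{\omega_1} = \chi_B \}) \cdot \mu_2(\{ \omega_2 \mid x^{(2)}_{\omega_2} = \chi_C \}).$$

We can divide both sides of this equation by
$\mu(\mathcal{E}) = \mu_1(\mathcal{E}_1) \cdot \mu_2(\mathcal{E}_2)$ to obtain

$$m(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C),$$

where $m = U(\mathcal{E}, \mu, X)$, and $m_i = U(\mathcal{E}_i, \mu_i, X_i)$, $i = 1,2$. Thus

$$U\big( (\mathcal{E}_1, \mu_1, X_1) \odot (\mathcal{E}_2, \mu_2, X_2) \big) = U(\mathcal{E}_1, \mu_1, X_1) \oplus' U(\mathcal{E}_2, \mu_2, X_2),$$

which is to say that $U$ is a homomorphism.

Finally, we show that $U$ is onto. Recall that there are $n$ elements in $\Lambda$, and so
there are $2^n$ different subsets of $\Lambda$. For a given mass distribution
$m \in \mathcal{M}'$, consider a set of $2^n$ experts $\mathcal{E}$, with each expert
$\omega \in \mathcal{E}$ giving a distinct subset $\Gamma(\omega) \subseteq \Lambda$ as the
set of possibilities. If we give expert $\omega$ the weight
$\mu(\{\omega\}) = m(\Gamma(\omega))$, and set $x_\omega = \chi_{\Gamma(\omega)}$, then it is easy
to see that $m = U(\mathcal{E}, \mu, X)$. ■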
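
The homomorphism property of Lemma 3 can be checked on a small example. In the sketch below, boolean opinions are stored as possibility sets, `mass` plays the role of the map U, and `combine_masses` implements the sum over pairs of subsets with a given intersection that appears in the proof. All names and the example data are assumptions made for the illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def combine_boolean(space1, space2):
    """Product of two boolean expert spaces: weights multiply, sets intersect."""
    return {
        (w1, w2): (mu1 * mu2, g1 & g2)
        for (w1, (mu1, g1)), (w2, (mu2, g2)) in product(space1.items(), space2.items())
    }


def mass(space):
    """Weighted fraction of experts whose possibility set is exactly A."""
    total = sum(Fraction(weight) for weight, _ in space.values())
    m = defaultdict(Fraction)
    for weight, gamma in space.values():
        m[gamma] += Fraction(weight) / total
    return dict(m)


def combine_masses(m1, m2):
    """m(A) = sum of m1(B) * m2(C) over all pairs with B & C == A."""
    m = defaultdict(Fraction)
    for (B, vb), (C, vc) in product(m1.items(), m2.items()):
        m[B & C] += vb * vc
    return dict(m)


# Hypothetical example over the labels {"a", "b", "c"}.
space1 = {"e1": (1, frozenset({"a", "b"})), "e2": (3, frozenset({"b", "c"}))}
space2 = {"f1": (2, frozenset({"a"})), "f2": (1, frozenset({"b", "c"}))}

lhs = mass(combine_boolean(space1, space2))          # combine experts, then take statistics
rhs = combine_masses(mass(space1), mass(space2))     # take statistics, then combine
```

Both orders of operation give the same mass distribution, including the mass of 1/2 assigned to the empty set by the committee whose members have disjoint possibility sets.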

In the immediately preceding proof that $U$ is onto, we assigned weights to
experts. This is the only place where we absolutely require the existence of differential
weights on experts. However, if we restrict ourselves to spaces $\mathcal{M}'$ and
$\mathcal{M}$ containing only rational values for the mass distribution functions (as, for
example, is the case in any computer implementation), then the weights can be eliminated,
and replaced by counting measure.

Recall from Section 2 that the map $V : \mathcal{M}' \to \mathcal{M}$ is also a homomorphism.
So we can compose the homomorphisms $T : \mathcal{N} \to \mathcal{N}'$ with
$U : \mathcal{N}' \to \mathcal{M}'$ with $V : \mathcal{M}' \to \mathcal{M}$ to obtain
the following obvious theorem.

Theorem: The map $V \circ U \circ T : \mathcal{N} \to \mathcal{M}$ is a homomorphism of monoids
mapping onto the space of belief states $(\mathcal{M}, \oplus)$. ■

This theorem provides the justification for the viewpoint that the theory of evidence
space $\mathcal{M}$ represents the space $\mathcal{N}$ via the representation
$V \circ U \circ T$. The proof follows from the lemmas; since each of the component maps in this
representation is an onto homomorphism, the composition also maps homomorphically onto the
entire theory of evidence space.

The significance of this result is that we can regard combinations of elements in
the theory of evidence as combinations of elements in the space of opinions of
experts. For if $m_1, \ldots, m_k$ are elements in $\mathcal{M}$ which are to be combined under
$\oplus$, we can find respective preimages in $\mathcal{N}$ under the map $V \circ U \circ T$,
and then combine those elements using the operation $\otimes$ in the space of opinions of
experts $\mathcal{N}$. After all combinations in $\mathcal{N}$ are completed, we project back
to $\mathcal{M}$ by $V \circ U \circ T$; the result will be the same as if we had combined the
elements in $\mathcal{M}$. The only advantage to this procedure is that combinations in
$\mathcal{N}$ are conceptually simpler: we can regard the combination as Bayesian updatings on
the product space of experts.

5.   An  Alternative  Method  for  Combining  Evidence

With  the  viewpoint  that  the  theory  of  evidence  is  really  simply  statistics  of 
opinions  of  experts,  we  can  make  certain  remarks  on  the  limitations  of  the  theory. 

(1)  There  is  no  use  of  probabilities  or  degrees  of  confidence.  Although  the 
belief  values  seem  to  give  weighted  results,  at  the  base  of  the  theory  experts 
only  say  whether  a  condition  is  possible  or  not.  In  particular,  the  theory 
makes  no  distinction  between  an  expert's  opinion  that  a  label  is  likely  or  that 
it  is  remotely  possible. 

(2)  Pairs  of  experts  combine  opinions  in  a  Bayesian  fashion  with  independence 
assumptions  of  the  sources  of  evidence.  In  particular,  dependencies  in  the 
sources  of  information  are  not  taken  into  account. 

(3)  Combinations  take  place  over  the  product  space  of  experts.  It  might  be 
more  reasonable  to  have  a  single  set  of  experts  modifying  their  opinions  as 
new  information  comes  in,  instead  of  forming  the  set  of  all  committees  of 
mixed  pairs. 

Both  the  second  and  third  limitations  come  about  due  to  the  desire  to  have  a 
combination  formula  which  factors  through  to  the  statistics  of  the  experts  and  is 
application-independent.  The  need  for  the  second  limitation,  the  independence 
assumption  on  the  sources  of  evidence,  is  well-known  (see,  e.g.,  [29]).  Without 
incorporating  much  more  complicated  models  of  judgements  under  multiple  sources 
of  knowledge,  we  can  hardly  expect  anything  better. 

The  first  objection,  however,  suggests  an  alternate  formulation  which  makes 
use  of  the  probabilistic  assessments  of  the  experts.  Basically,  the  idea  is  to  keep 
track  of  the  density  distributions  of  the  opinions  in  probability  space.  Of  course, 
complete  representation  of  the  distribution  would  amount  to  recording  the  full  set  of 
opinions $p_\omega$ for all $\omega$. Instead, it is more reasonable to approximate the distribution
by  some  parameterization,  and  update  the  distribution  parameters  by  combination 
formulas. 

We  present  a  formulation  based  on  normal  distributions  of  logarithms  of  updat- 
ing coefficients.  Other  formulations  are  possible.  In  marked  contrast  to  the 
Dempster/Shafer  formulation,  we  assume  that  all  opinions  of  all  experts  are  nonzero 
for  every  label.  That  is,  instead  of  converting  opinions  into  boolean  statements  by 
test  for  zero,  we  will  assume  that  all  the  values  are  nonzero,  and  model  the  distribu- 
tion of  their  strengths. 

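A simple rewrite of Equation (8) of Section 3.3 yields

$$\operatorname{Prob}(\lambda \mid s_1, s_2) = c(s_1, s_2) \cdot \operatorname{Prob}(\lambda) \cdot \frac{\operatorname{Prob}(\lambda \mid s_1)}{\operatorname{Prob}(\lambda)} \cdot \frac{\operatorname{Prob}(\lambda \mid s_2)}{\operatorname{Prob}(\lambda)}.$$

This equation depends on an independence assumption, Equation (7). We can
iterate this equation to obtain a formula for $\operatorname{Prob}(\lambda \mid s_1, \ldots, s_k)$.
In this iteration process, $s_1$ and $s_2$ successively take the place of
$s_1 \wedge \cdots \wedge s_i$ and $s_{i+1}$ respectively, as $i$ increases from 1 to $k-1$.
Accordingly, we require a sequence of independence assumptions, which will take the form

$$\operatorname{Prob}(s_{i+1} \mid s_1 \wedge \cdots \wedge s_i, \lambda) = \operatorname{Prob}(s_{i+1} \mid \lambda)$$

for $i = 1, \ldots, k-1$. Under these assumptions, we obtain

$$\operatorname{Prob}(\lambda \mid s_1, \ldots, s_k) = c(s_1, \ldots, s_k) \cdot \operatorname{Prob}(\lambda) \cdot \prod_{i=1}^{k} \frac{\operatorname{Prob}(\lambda \mid s_i)}{\operatorname{Prob}(\lambda)}.$$

In a manner similar to [3], set

$$L(\lambda \mid s_i) = \log \left[ \frac{\operatorname{Prob}(\lambda \mid s_i)}{\operatorname{Prob}(\lambda)} \right].$$

(Note, incidentally, that these values are not the so-called "log-likelihood ratios"; in
particular, the $L(\lambda \mid s_i)$'s can be both positive and negative.) We then obtain

$$\log[\operatorname{Prob}(\lambda \mid s_1, \ldots, s_k)] = c + \log[\operatorname{Prob}(\lambda)] + \sum_{i=1}^{k} L(\lambda \mid s_i),$$

where $c$ is a constant independent of $\lambda$ (but not of $s_1, \ldots, s_k$).

The consequence of this formula is that if the independence assumptions hold,
and if $\operatorname{Prob}(\lambda)$ and $L(\lambda \mid s_i)$ are known for all $\lambda$ and
$i$, then the values $\operatorname{Prob}(\lambda \mid s_1, \ldots, s_k)$ can be calculated from

$$\operatorname{Prob}(\lambda \mid s_1, \ldots, s_k) = \frac{\operatorname{Prob}(\lambda) \cdot \exp\big[ \sum_{i=1}^{k} L(\lambda \mid s_i) \big]}{\sum_{\lambda'} \operatorname{Prob}(\lambda') \cdot \exp\big[ \sum_{i=1}^{k} L(\lambda' \mid s_i) \big]}. \tag{12}$$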
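
Equation (12) amounts to adding the logarithmic coefficients to the log-priors and renormalizing. A minimal sketch follows; the function name, the dictionary representation, and the two-label numbers are assumptions made for the example. Subtracting the largest score before exponentiating is a standard numerical precaution added here and does not change the result, since the common factor cancels in the ratio.

```python
import math


def posterior_from_log_coefficients(prior, log_coeffs):
    """Posterior over labels from priors and a list of L(label | s_i) maps.

    prior:      dict mapping label -> Prob(label), all positive
    log_coeffs: list of dicts, each mapping label -> L(label | s_i)
    """
    scores = {
        lab: math.log(prior[lab]) + sum(L[lab] for L in log_coeffs)
        for lab in prior
    }
    top = max(scores.values())  # shift for numerical stability only
    weights = {lab: math.exp(score - top) for lab, score in scores.items()}
    z = sum(weights.values())
    return {lab: w / z for lab, w in weights.items()}


# Hypothetical two-label example: single-evidence posteriors p1 and p2
# are converted to logarithmic coefficients relative to the prior.
prior = {"flu": 0.25, "cold": 0.75}
p1 = {"flu": 4 / 7, "cold": 3 / 7}
p2 = {"flu": 5 / 8, "cold": 3 / 8}
L1 = {lab: math.log(p1[lab] / prior[lab]) for lab in prior}
L2 = {lab: math.log(p2[lab] / prior[lab]) for lab in prior}

posterior = posterior_from_log_coefficients(prior, [L1, L2])
```

For these numbers the unnormalized weights are proportional to 10/7 for "flu" and 3/14 for "cold", so the posterior probability of "flu" is 20/23.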

Accordingly, we introduce a space which we term "logarithmic opinions of
experts." For convenience, we will assume that experts have equal weights. An
element in this space will consist of a set of experts $\mathcal{E}_i$, and a collection of
opinions $Y_i = \{ y^{(i)}_\omega \}_{\omega \in \mathcal{E}_i}$. Each $y^{(i)}_\omega$ is a map, and
the component $y^{(i)}_\omega(\lambda)$ represents expert $\omega$'s estimate of
$L(\lambda \mid s_i)$:

$$y^{(i)}_\omega : \Lambda \to \mathbb{R}, \qquad y^{(i)}_\omega(\lambda) \approx L(\lambda \mid s_i).$$

Note that the experts in $\mathcal{E}_i$ all have knowledge of the information $s_i$, and that
the estimated logarithmic coefficients $L(\lambda \mid s_i)$ can be positive or negative. In
fact, since the experts do not necessarily have precise knowledge of the value of
$\operatorname{Prob}(\lambda)$, but instead provide estimates of logs of ratios, the estimates
can lie in an unbounded range.

In analogy with our map to a statistical space (Section 3.2), we can define a
space which might be termed the "parameterized statistics of logarithmic opinions of
experts." Elements in this space will consist of pairs $(u, C)$, where $u$ is in
$\mathbb{R}^n$ and $C$ is a symmetric $n$ by $n$ matrix. We next describe how to project from the
space of logarithmic opinions to the space of parameterized statistics.

Let us suppose that for a set of experts $\mathcal{E}$, and for
$\Lambda = \{\lambda_1, \ldots, \lambda_n\}$, the $n$-vectors composed of the logarithmic
opinions $y_\omega \in \mathbb{R}^n$, $y_\omega = (y_\omega(\lambda_1), \ldots, y_\omega(\lambda_n))$,
are approximately (multi-)normally distributed. Thus we model the distribution of
the random vector $y = (y(\lambda_1), \ldots, y(\lambda_n))$ by the density function

$$m(y) = \frac{1}{(2\pi)^{n/2} \sqrt{\det C}} \exp\!\Big( -\tfrac{1}{2} (y - u)^T C^{-1} (y - u) \Big), \qquad y \in \mathbb{R}^n,$$

where $u \in \mathbb{R}^n$ is the mean of the distribution, and $C$ is the $n$ by $n$ covariance
matrix. That is, in terms of the expectation operator $E\{\cdot\}$ on random variables over the
sample space $\mathcal{E}$,

$$u = (u_1, \ldots, u_n), \qquad u_i = E\{ y(\lambda_i) \},$$

and for $C = (c_{ij})$,

$$c_{ij} = E\{ (y(\lambda_i) - u_i)(y(\lambda_j) - u_j) \}.$$

These measurements of the statistics of the $y(\lambda)$'s can be made regardless of the
true distributions. The accuracy of the model depends on the degree to which the
multinormal distribution assumption is valid.

Next we discuss combination formulas in both spaces. Suppose $(\mathcal{E}_i, Y_i)$,
$i = 1,2$, are two elements in the space of logarithmic opinions, each describing a
sample space of experts together with opinions. Since according to Equation (12),
the logarithmic opinions add, we define the combination of the two elements by
$(\mathcal{E}, Y)$, where

$$\mathcal{E} = \mathcal{E}_1 \times \mathcal{E}_2,$$

$$Y = \{ y_{(\omega_1,\omega_2)} \}_{(\omega_1,\omega_2) \in \mathcal{E}},$$

$$y_{(\omega_1,\omega_2)}(\lambda) = y^{(1)}_{\omega_1}(\lambda) + y^{(2)}_{\omega_2}(\lambda).$$

To consider combinations in the space of statistics, let $m_i(y)$ be the density
function over $\mathbb{R}^n$ for the random vector $y^{(i)}$ over the sample space
$\mathcal{E}_i$, $i = 1,2$. Assume that each $m_i$ is a multinormal distribution, associated with
a mean vector $u^{(i)}$ and a covariance $C^{(i)}$. In order that the projection to the space of
statistics be a homomorphism, the definition of combination in the space of statistics should
respect the true statistics of the combined opinions. The density function $m(y)$ for
the combination $y_{(\omega_1,\omega_2)}$, $(\omega_1, \omega_2) \in \mathcal{E}$, is given by

$$m(y) = \int_{\mathbb{R}^n} m_1(y')\, m_2(y - y')\, dy'.$$

This is the point where we use the fact that the logarithmic opinions add under combination.

Projecting to the space of statistics, we discover the advantage of modeling the
distributions by normal functions. Namely, since the convolution of a Gaussian by a
Gaussian is once again a Gaussian, we define the combination formula

$$(u^{(1)}, C^{(1)}) \oplus (u^{(2)}, C^{(2)}) = (u^{(1)} + u^{(2)},\ C^{(1)} + C^{(2)}).$$

That is, since $m_1$ and $m_2$ are multinormal distributions, their convolution is also
multinormal with mean and covariance which are the sums of the contributing
means and covariances. (This result is easily proven using Fourier transforms.) An
extension to the case where $\mathcal{E}_1$ and $\mathcal{E}_2$ have nonequal total weights is
straightforward.

Having defined combination in the space of statistics, one must show that the
transformation from the space of opinions to the space of statistics is a
homomorphism, even when the logarithmic opinions are not truly normally distributed.
This is easily done, since the means and covariances of the sum of two independent
random vectors are the sums of the means and covariances of the two random vectors,
and the two components of a committee drawn from the product space of experts are
independent.
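
The additivity of means and covariances over the product space of experts can be verified directly on a small example, without any normality assumption. In the sketch below, each expert's logarithmic opinion is a vector, experts have equal weights, and statistics are population means and covariances; the function name and the sample vectors are assumptions made for the illustration.

```python
from fractions import Fraction as F
from itertools import product


def mean_and_cov(vectors):
    """Population mean vector and covariance matrix, with equal weights."""
    n, count = len(vectors[0]), len(vectors)
    u = [sum(v[i] for v in vectors) / F(count) for i in range(n)]
    C = [
        [sum((v[i] - u[i]) * (v[j] - u[j]) for v in vectors) / F(count) for j in range(n)]
        for i in range(n)
    ]
    return u, C


# Hypothetical logarithmic opinions over two labels.
Y1 = [(1, 0), (3, 2), (2, -2)]   # three experts in the first set
Y2 = [(0, 1), (4, -1)]           # two experts in the second set

# Committee opinions: one vector sum per pair of experts.
Y = [tuple(a + b for a, b in zip(y1, y2)) for y1, y2 in product(Y1, Y2)]

u1, C1 = mean_and_cov(Y1)
u2, C2 = mean_and_cov(Y2)
u, C = mean_and_cov(Y)

u_sum = [a + b for a, b in zip(u1, u2)]
C_sum = [[a + b for a, b in zip(row1, row2)] for row1, row2 in zip(C1, C2)]
```

The statistics `(u, C)` computed over the six committees agree exactly with `(u_sum, C_sum)`, the componentwise sums of the statistics of the two original sets.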

To interpret a state $(u, C)$ in the space of parameterized statistics, we must
remember the origin of the logarithmic-opinion values. Specifically, after $k$ updating
iterations combining information $s_1$ through $s_k$, the updated vector
$y = (y_1, \ldots, y_n) \in \mathbb{R}^n$ is an estimate of the sum of the logarithmic
coefficients,

$$y_j \approx \sum_{i=1}^{k} L(\lambda_j \mid s_i).$$

According to Equation (12), the a posteriori probabilities can then be calculated
from this estimate (providing the prior $\operatorname{Prob}(\lambda)$'s are known). In
particular, the a posteriori probability of a label $\lambda_j$ is high if the corresponding
coefficient $y_j + \log[\operatorname{Prob}(\lambda_j)]$ is large in comparison to the other
components $y_i + \log[\operatorname{Prob}(\lambda_i)]$.

Since the state $(u, C)$ represents a multinormal distribution in the log-updating
space, we can transform this distribution to a density function for a posteriori
probabilities. Basically, a label will have a high probability if
$u_j + \log[\operatorname{Prob}(\lambda_j)]$ is relatively large. However, the components of $u$
represent the center of the distribution (before bias by the priors). The spread of the
distribution is given by the covariance matrix, which can be thought of as defining an
ellipsoid in $\mathbb{R}^n$ centered at $u$. The exact equation of the ellipsoid can be written
implicitly as:

$$(y - u)^T C^{-1} (y - u) = 1.$$

This ellipsoid describes a "one sigma" variation in the distribution, representing a
region of uncertainty of the logarithmic opinions; the distribution to two standard
deviations lies in a similar but enlarged ellipsoid. The eigenvalues of $C$ give the
squared lengths of the semi-axes of the ellipsoid, and accordingly measure the degree of
uncertainty (large eigenvalues correspond to low confidence). The eigenvectors give the
directions in which the eigenvalues measure this uncertainty. Bias by the prior
probabilities simply adds a fixed vector, with components
$\log[\operatorname{Prob}(\lambda_i)]$, to the ellipsoid, thereby translating the
distribution. We seek an axis $j$ such that the components $y_j$ of the vectors $y$ lying in
the translated ellipsoid are relatively much larger than other components of vectors in
the ellipsoid. In this case, the preponderant evidence is for label $\lambda_j$.

For example, in a three-label case, we might have priors of approximately
(.01, .19, .8), and evidence with the following means and covariances in
log-probability space of

$$u_1 = (1.,\ 0.,\ -.01), \qquad C_1 = \begin{pmatrix} .5 & 0 & 0 \\ 0 & .5 & 0 \\ 0 & 0 & .001 \end{pmatrix}$$

and

$$u_2 = (.4,\ -.1,\ -.2), \qquad C_2 = \begin{pmatrix} 2. & 0 & 0 \\ 0 & .05 & 0 \\ 0 & 0 & .1 \end{pmatrix}.$$

Then adding means and covariances, and using Equation (12) to reinterpret in terms
of probabilities, we come up with a current estimated probability distribution
(.64, .08, .28) but with a large uncertainty region. For example, within a one-sigma
displacement from the mean opinion, we have the distribution (.13, .18, .69). We
conclude that the evidence tends to indicate that label 2 is probable, but there is
considerable uncertainty.

Clearly, the combination formula is extremely simple. Its greatest advantage
over the Dempster/Shafer theory of evidence is that only $O(n^2)$ values are required
to describe a state, as opposed to the $2^n$ values used for a mass distribution in M.
The simplicity and reduction in the number of parameters have been purchased at the
expense of an assumption about the kinds of distributions that can be expected.
However, the same assumption allows us to track probabilistic opinions (or actually,
their logarithms), instead of converting all opinions into boolean statements about
possibilities.
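
The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a state is assumed to be a (mean vector, covariance matrix) pair in log-probability space, and the numeric values are our reading of the three-label example in this section. The conversion back to probabilities via Equation (12) is deliberately left out.

```python
def combine(state_a, state_b):
    """Combine two Gaussian opinion states by adding means and covariances."""
    (mean_a, cov_a), (mean_b, cov_b) = state_a, state_b
    mean = [x + y for x, y in zip(mean_a, mean_b)]
    cov = [[x + y for x, y in zip(row_a, row_b)]
           for row_a, row_b in zip(cov_a, cov_b)]
    return mean, cov


def gaussian_value_count(n):
    """Values needed for a Gaussian state over n labels: n mean components
    plus the n(n+1)/2 distinct entries of a symmetric covariance matrix."""
    return n + n * (n + 1) // 2


def mass_value_count(n):
    """Values in a mass distribution over all subsets of n labels."""
    return 2 ** n


# Assumed reading of the example's means and (diagonal) covariances.
state_1 = ([1.0, 0.0, -0.01], [[0.5, 0, 0], [0, 0.5, 0], [0, 0, 0.001]])
state_2 = ([0.4, -0.1, -0.2], [[2.0, 0, 0], [0, 0.05, 0], [0, 0, 0.1]])

mean, cov = combine(state_1, state_2)
print("combined mean:", mean)
print("combined covariance diagonal:", [cov[i][i] for i in range(3)])

for n in (3, 10, 20):
    print(n, gaussian_value_count(n), mass_value_count(n))
```

For three labels the two counts are comparable (9 against 8); the quadratic count only pays off as the number of labels grows, for instance 230 against 1,048,576 values at n = 20.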

6.   Conclusions 

We  have  shown  how  the  theory  of  evidence  may  be  viewed  as  a  representation 
of  a  space  of  opinions  of  experts,  where  opinions  are  combined  in  a  Bayesian 
fashion over the product space of experts. (Refer to Figure 2.) By "representation",
we mean something very specific: namely, that there is a homomorphism
mapping  from  the  space  of  opinions  of  experts  onto  the  Dempster/Shafer  theory  of 
evidence  space.  This  map  fails  to  be  an  isomorphism  (which  would  imply 
equivalence  of  the  spaces)  only  insofar  as  it  is  many-to-one.  That  is,  for  each  state 
in  the  theory  of  evidence,  there  is  a  collection  of  elements  in  the  space  of  opinions 
of  experts  which  all  map  to  the  single  state.  In  this  way  the  state  in  the  theory  of 
evidence represents the corresponding collection of elements. In fact, what this
collection of elements has in common is that the statistics of the opinions of the
experts defined by each element are similar, in terms of the way statistics are
measured by the map U.

Furthermore,  combination  in  the  space  of  opinions  of  experts,  as  defined  in 
Section  3,  leads  to  combination  in  the  theory  of  evidence  space.  This  allows  us  to 
implement combination in a somewhat simpler manner, since the formulas for combination
without the normalization are simpler than the more standard formulas,
and also permits us to view combination in the theory of evidence space as the
tracking of statistics of opinions of experts as they combine information in a pairwise
Bayesian fashion over the product space of experts. Applying a Bayesian
interpretation  to  the  updating  of  the  opinions  of  experts  also  makes  clear  the  implicit 
independence  assumptions  which  must  exist  in  order  to  combine  evidence  in  the 
prescribed  manner. 
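
The homomorphism property can be checked numerically on a toy case. The sketch below rests on simplifying assumptions that are ours: experts are equally weighted, each opinion is a set of still-possible labels, a pair of experts combines by intersecting their sets, and the statistic is the fraction of experts naming each nonempty set. All function names are hypothetical. Under these assumptions, taking statistics after combining experts gives the same result as applying Dempster's rule to the two sets of statistics.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def mass_from_experts(opinions):
    """Fraction of experts naming each nonempty label set, among the experts
    whose opinion is nonempty (fails if every opinion is empty)."""
    counts = defaultdict(int)
    for opinion in opinions:
        if opinion:  # experts left with no possible label are dropped
            counts[opinion] += 1
    total = sum(counts.values())
    return {subset: Fraction(c, total) for subset, c in counts.items()}


def combine_experts(group_1, group_2):
    """Product space of experts: each pair pools its boolean opinions,
    keeping only the labels that both members consider possible."""
    return [a & b for a, b in product(group_1, group_2)]


def dempster(m1, m2):
    """Standard Dempster rule of combination on two mass functions."""
    raw = defaultdict(Fraction)
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        raw[a & b] += wa * wb
    conflict = raw.pop(frozenset(), Fraction(0))
    return {s: w / (1 - conflict) for s, w in raw.items()}


group_1 = [frozenset("a"), frozenset("ab"), frozenset("abc")]
group_2 = [frozenset("b"), frozenset("ac")]

via_experts = mass_from_experts(combine_experts(group_1, group_2))
via_dempster = dempster(mass_from_experts(group_1), mass_from_experts(group_2))
print(via_experts == via_dempster)
```

Exact rational arithmetic (`Fraction`) is used so that the two routes can be compared for equality without floating-point tolerance.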

From  this  viewpoint,  we  can  see  how  the  Dempster/Shafer  theory  of  evidence 
accomplishes its goals. Degrees of support for a proposition, belief, and plausibilities
are all measured in terms of joint and disjunctive probabilities over a set of
experts who are naming possible labels given current information. The problem of
ambiguous knowledge versus uncertain knowledge, which is frequently described in
terms of "withholding belief," can be viewed as two different distributions of opinions.
In particular, ambiguous knowledge can be seen as observing high densities of
opinions  on  particular  disjoint  subsets,  whereas  uncertain  knowledge  corresponds  to 
unanimity  of  opinions,  where  the  agreed  upon  opinion  gives  many  possibilities. 
Finally, instead of performing Bayesian updating directly over the space of labels,
a set of values is updated in a Bayesian fashion over the product space of experts,
which results in non-Bayesian formulas over the space of labels.
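
The distinction between ambiguous and uncertain knowledge can be made concrete with a small sketch. It assumes equally weighted experts whose opinions are all nonempty sets of possible labels, and it reads belief as the fraction of experts whose opinion lies inside the queried set and plausibility as the fraction whose opinion overlaps it. The function names and the two-label example are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction


def belief(opinions, subset):
    """Fraction of experts whose opinion is contained in `subset`."""
    hits = sum(1 for op in opinions if op and op <= subset)
    return Fraction(hits, len(opinions))


def plausibility(opinions, subset):
    """Fraction of experts whose opinion shares a label with `subset`."""
    hits = sum(1 for op in opinions if op & subset)
    return Fraction(hits, len(opinions))


# Ambiguous knowledge: opinions concentrate on disjoint subsets.
ambiguous = [frozenset("a")] * 5 + [frozenset("b")] * 5
# Uncertain knowledge: unanimous opinion that leaves several possibilities.
uncertain = [frozenset("ab")] * 10

query = frozenset("a")
print("ambiguous:", belief(ambiguous, query), plausibility(ambiguous, query))
print("uncertain:", belief(uncertain, query), plausibility(uncertain, query))
```

In the ambiguous case belief and plausibility for label a coincide at 1/2, while in the uncertain case they spread to the full interval from 0 to 1, which is how the two situations show up as different distributions of opinions.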

In  meeting  each  of  these  goals,  the  theory  of  evidence  invokes  compromises 
that we might wish to change. For example, in order to track statistics, it is necessary
to model the distribution of opinions. If these opinions are probabilistic assignments
over the set of labels, then the distribution function will be too complicated to
retain precisely. The Dempster/Shafer theory of evidence solves this problem by
simplifying the opinions to boolean decisions, so that each expert's opinion lies in a
space having $2^n$ elements. In this way, the full set of statistics can be specified using
$2^n$ values. We have suggested an alternate method, which retains the probability
values in the opinions without converting them into boolean decisions, and requires
only $O(n^2)$ values to model the distribution, but fails to retain full information
about  the  distribution.  Instead,  our  method  attempts  to  approximate  the  distribution 
of  opinions  with  a  Gaussian  function. 